Wav2vec-MoE: An unsupervised pre-training and adaptation method for multi-accent ASR

Authors

Abstract

In real life, either the subjective factors of speakers or the objective environment degrades the performance of automatic speech recognition (ASR). This study focuses on one of these factors, accented speech, and attempts to build a multi-accent ASR system to solve the performance degradation caused by different accents, whose characteristic is low resource. To deal with the challenge of low-resource data of different styles, wav2vec-MoE (mixture of experts) is proposed to adapt wav2vec 2.0 for multi-accent ASR. In wav2vec-MoE, a domain MoE is developed by introducing pseudo-domain information in the pre-training stage, where a domain denotes a collection of varied speech styles arising from the same influence factors. The domain MoE is trained with two strategies according to a mismatch assessment between the unlabeled data and the target speech, without requiring any explicit domain information. Experiments show that wav2vec-MoE achieves a 14.69% relative word error rate reduction (WERR) on the AESRC2020 accent dataset and an 8.79% WERR on the Common Voice English dataset.
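
The full text is behind a paywall, so the paper's exact architecture is not reproduced here. The sketch below is only a minimal, hypothetical PyTorch illustration of the domain-MoE idea the abstract describes: a learned gate softly assigns each encoder frame to per-domain experts (e.g. accent clusters), so no explicit domain labels are required. The DomainMoE class name, the number of experts, the soft-gating design, and all dimensions are assumptions, not the paper's implementation.

import torch
import torch.nn as nn

class DomainMoE(nn.Module):
    """Hypothetical domain MoE layer for a wav2vec 2.0-style encoder."""

    def __init__(self, hidden_dim: int = 768, num_experts: int = 4):
        super().__init__()
        # One feed-forward expert per (pseudo-)domain, e.g. per accent cluster.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_dim, hidden_dim * 4),
                nn.GELU(),
                nn.Linear(hidden_dim * 4, hidden_dim),
            )
            for _ in range(num_experts)
        )
        # Gating network: predicts a soft domain assignment per frame,
        # standing in for the paper's pseudo-domain information.
        self.gate = nn.Linear(hidden_dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, hidden_dim) frame representations from the encoder.
        weights = torch.softmax(self.gate(x), dim=-1)                   # (B, T, E)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)  # (B, T, H, E)
        # Mix expert outputs by the gate weights, with a residual connection.
        mixed = torch.einsum("bthe,bte->bth", expert_out, weights)
        return x + mixed

# Usage: apply to the output of a pre-trained encoder block.
moe = DomainMoE(hidden_dim=768, num_experts=4)
frames = torch.randn(2, 100, 768)   # dummy batch of encoder frames
out = moe(frames)                   # same shape: (2, 100, 768)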


Similar Articles

Online Unsupervised Multilingual Acoustic Model Adaptation for Non-Native ASR

Automatic speech recognition (ASR) is currently one of the main research interests in computer science. Hence, many ASR systems are available in the market. Yet, the performance of speech and language recognition systems is poor on nonnative speech. The challenge for nonnative speech recognition is to maximize the accuracy of a speech recognition system when only a small amount of nonnative dat...

Unsupervised acoustic model adaptation for multi-origin non-native ASR

To date, the performance of speech and language recognition systems is poor on non-native speech. The challenge for nonnative speech recognition is to maximize the accuracy of a speech recognition system when only a small amount of nonnative data is available. We report on the acoustic model adaptation for improving the recognition of non-native speech in English, French and Vietnamese, spoken ...

Unsupervised Pretraining Encourages Moderate-Sparseness

It is well known that direct training of deep neural networks will generally lead to poor results. Major progress in recent years has been the invention of various pretraining methods to initialize network parameters, and it was shown that such methods lead to good prediction performance. However, the reason for the success of pretraining has not been fully understood, although it was argued that re...

Unsupervised Pretraining for Sequence to Sequence Learning

This work presents a general unsupervised learning method to improve the accuracy of sequence to sequence (seq2seq) models. In our method, the weights of the encoder and decoder of a seq2seq model are initialized with the pretrained weights of two language models and then fine-tuned with labeled data. We apply this method to challenging benchmarks in machine translation and abstractive summariz...
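
As a rough, hypothetical illustration of the initialization scheme that snippet describes (not the paper's actual code), the sketch below copies the weights of two small pre-trained language models into the encoder and decoder of a seq2seq model before fine-tuning. The SeqLM class and all sizes are assumptions made for this example.

import torch
import torch.nn as nn

class SeqLM(nn.Module):
    """A small LSTM language model, standing in for a pre-trained LM."""

    def __init__(self, vocab: int = 10_000, dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        hidden, _ = self.lstm(self.embed(tokens))
        return self.head(hidden)

# Pretend these were pre-trained on unlabeled source/target text.
source_lm, target_lm = SeqLM(), SeqLM()

# Initialize the seq2seq encoder and decoder from the two LMs.
encoder, decoder = SeqLM(), SeqLM()
encoder.load_state_dict(source_lm.state_dict())
decoder.load_state_dict(target_lm.state_dict())

# Fine-tuning then updates all parameters on labeled sequence pairs.
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)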

BotOnus: an online unsupervised method for Botnet detection

Botnets are recognized as one of the most dangerous threats to the Internet infrastructure. They are used for malicious activities such as launching distributed denial of service attacks, sending spam, and leaking personal information. Existing botnet detection methods offer a number of good ideas, but they are far from complete yet, since most of them cannot detect botnets in an early stage ...

Journal

Journal title: Electronics Letters

Year: 2023

ISSN: 0013-5194, 1350-911X

DOI: https://doi.org/10.1049/ell2.12823